Add UBB power and power_limit fields to npm_info for MI350X baseboard power monitoring (SWDEV-567812) #3262
koushikbillakanti-amd wants to merge 1 commit into develop
oliveiradan reviewed Feb 14, 2026
    rsmi_status_t get_ubb_power(const std::string &board_path, uint64_t *power) {
Contributor
We seem to have some duplicated work between:
- get_ubb_power()
- get_ubb_power_limit()
The duplicated part could be extracted into a helper function.
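A minimal sketch of the suggested refactor, written in Python for readability (the real functions are C++ in rocm_smi_npm.cc, so signatures and status codes differ). `_read_sysfs_u64`, the `Status` enum, and the file-name constants are hypothetical; the sketch assumes each sysfs file holds a single unsigned decimal integer and that a missing file should map to NOT_SUPPORTED.

```python
import enum
import os


class Status(enum.Enum):
    # Hypothetical stand-ins for the rsmi_status_t codes used by the C++ code.
    SUCCESS = 0
    NOT_SUPPORTED = 1
    UNEXPECTED_DATA = 2


# File names come from the PR description; the board directory is assumed.
UBB_POWER_FILE = "baseboard_power"
UBB_POWER_LIMIT_FILE = "baseboard_power_limit"


def _read_sysfs_u64(board_path, filename):
    """Shared helper: read one unsigned 64-bit integer from board_path/filename.

    Returns (status, value); value is None unless status is SUCCESS.
    """
    path = os.path.join(board_path, filename)
    try:
        with open(path, "r", encoding="ascii") as f:
            text = f.read().strip()
    except FileNotFoundError:
        # No such sysfs node: treat the device as lacking UBB capability.
        return Status.NOT_SUPPORTED, None
    try:
        value = int(text, 10)
    except ValueError:
        return Status.UNEXPECTED_DATA, None
    if value < 0 or value > 0xFFFFFFFFFFFFFFFF:
        # Out of range for a uint64_t output parameter.
        return Status.UNEXPECTED_DATA, None
    return Status.SUCCESS, value


def get_ubb_power(board_path):
    # Each public getter only decides which file to read.
    return _read_sysfs_u64(board_path, UBB_POWER_FILE)


def get_ubb_power_limit(board_path):
    return _read_sysfs_u64(board_path, UBB_POWER_LIMIT_FILE)
```

With this shape, path building, parsing, and error mapping live in one place, so a fix to any of them applies to both getters.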
oliveiradan reviewed Feb 14, 2026
    status = "DISABLED" if status == amdsmi_interface.amdsmi_wrapper.AMDSMI_NPM_STATUS_DISABLED else "ENABLED"
    npm_dict.update({"status": status})
    # Add UBB power info if available (not UINT64_MAX sentinel)
    if ubb_power != "N/A" and ubb_power != 0xFFFFFFFFFFFFFFFF:
Contributor
I recommend defining a named constant for 0xFFFFFFFFFFFFFFFF and then using it wherever needed.
oliveiradan reviewed Feb 14, 2026
Contributor
oliveiradan left a comment
@koushikbillakanti-amd,
I added a few comments for you to look at.
@marifamd, @gabrpham,
Please take a look at the CLI changes when you get a few minutes.
Motivation
Add support for UBB (baseboard) power monitoring on MI350X, as required for health monitoring of GPU workloads on large clusters. This enables reading real-time baseboard power consumption and the baseboard power limit through AMDSMI.
Technical Details
- Extended the rsmi_npm_info_t and amdsmi_npm_info_t structures with ubb_power and ubb_power_limit fields, kept ABI-compatible by using existing reserved space.
- Added get_ubb_power() and get_ubb_power_limit() in rocm_smi_npm.cc, which read the sysfs files baseboard_power and baseboard_power_limit.
- Updated the Python wrapper and CLI to expose the new fields.
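To illustrate what "ABI-compatible using reserved space" means: new fields take over part of an existing reserved array, so the struct size and the offsets of every pre-existing field stay the same. The layouts below are hypothetical (the real members and reserved-array length of rsmi_npm_info_t / amdsmi_npm_info_t are not shown here); the sketch only demonstrates the size and offset check with ctypes.

```python
import ctypes


class NpmInfoOld(ctypes.Structure):
    # Hypothetical pre-change layout: status, a limit, and reserved padding.
    _fields_ = [
        ("status", ctypes.c_uint32),
        ("limit", ctypes.c_uint64),
        ("reserved", ctypes.c_uint64 * 10),
    ]


class NpmInfoNew(ctypes.Structure):
    # Two of the ten reserved slots become the new UBB fields.
    _fields_ = [
        ("status", ctypes.c_uint32),
        ("limit", ctypes.c_uint64),
        ("ubb_power", ctypes.c_uint64),
        ("ubb_power_limit", ctypes.c_uint64),
        ("reserved", ctypes.c_uint64 * 8),
    ]


def layout_is_compatible(old, new):
    """Same total size, and every non-reserved old field keeps its offset."""
    if ctypes.sizeof(old) != ctypes.sizeof(new):
        return False
    for name, _ in old._fields_:
        if name == "reserved":
            # The reserved tail is expected to shrink and move.
            continue
        if getattr(old, name).offset != getattr(new, name).offset:
            return False
    return True
```

A check like this could run in a unit test so that a future field addition that forgets to shrink the reserved array fails immediately.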
JIRA ID
SWDEV-567812
Test Plan
Built and tested on MI350X hardware with live UBB sysfs files. Verified that the Python API returns correct values via amdsmi_get_npm_info(). Confirmed that processors without UBB capability are handled gracefully with NOT_SUPPORTED.
Test Result
Successfully reads UBB Power (2127W) and UBB Power Limit (8400W) from MI350X card33. The build passes with no warnings. All 8 processors are handled correctly, including proper error handling for unsupported devices.
Submission Checklist